Abstract



Discovering causal relationships is a hard task, often hindered by the need for
intervention, and often requiring large amounts of data to resolve statistical
uncertainty. However, humans quickly arrive at useful causal relationships. One
possible reason is that humans extrapolate from past experience to new, unseen
situations: that is, they encode beliefs over causal invariances, allowing for sound
generalization from the observations they obtain from directly acting in the world.
Here we outline a Bayesian model of causal induction where beliefs over competing
causal hypotheses are modeled using probability trees. Based on this model,
we illustrate why, in the general case, we need interventions plus constraints on
our causal hypotheses in order to extract causal information from our experience.
1 Introduction

A fundamental problem of statistical causality is the problem of causal induction¹, namely, the
generalization from particular instances to abstract causal laws [4, 5]. For instance, how can you
conclude from a bad fall after slipping on a wet floor that it is dangerous to ride a bike on ice?

In this work, we are concerned with the following problem: how do we determine from experience
whether "X → Y and U → V" or "Y → X and V → U"? That is, which of the two causal
hypotheses over X, Y, U and V is correct,
even when both models represent identical joint distributions? Furthermore, if we collect
evidence supporting the claim "X → Y", how do we extrapolate this to the (yet unseen) situation
"U → V"? The main challenge in this problem is that the hypothesis, say H, is a random variable
that controls the very causal structure. That is, a more accurate graphical representation would be
the model:

[Diagram: the hypothesis H at the meta-level, governing the graphical model over X, Y, U and V]

which cannot be analyzed using the framework of graphical models alone, because the random
variable H operates on a meta-level of the graphical model over X, Y, U and V.



¹ For a thorough treatment of non-causal induction, refer to [?].
Figure 1: (Left) A device with a green and a red light bulb. A switch allows controlling the state
of the green light: either "on", "off" or "undisturbed". (Right) A second device having a green
spinner and a red spinner, both of which can lock into either a horizontal or a vertical position. The
two devices are connected through a cable, thus establishing a relation between their randomizing
mechanisms.



In this work these difficulties are overcome by using a probability tree to model the causal structure
over the random events [9]. Probability trees can encode² alternative causal realizations, and in
particular alternative causal hypotheses. All random variables are of the same type; no distinctions
between meta-levels are needed. Furthermore, we define interventions [7] on probability trees so as
to predict the statistical behavior after manipulation. We then show that such a formalization leads
to a probabilistic method for causal induction that is intuitively appealing.

2 Causal Induction in Probability Trees 

Imagine we are given a device with two light bulbs, one green (X) and one red (Y), whose states
obey a hidden mechanism that correlates them positively. Moreover, the box has a switch that allows
us to control the state of the green bulb: we can either leave it undisturbed, or we can intercept the
mechanism by turning the light on or off as we please (Figure 1, left device). We encode the "on"
and "off" states of the green light as X = x and X = ¬x respectively. Analogously, Y = y and
Y = ¬y denote the "on" and "off" states of the red light. We ponder the explanatory power of two
competing hypotheses: either "green causes red" (H = h) or "red causes green" (H = ¬h).

2.1 Representation 

Assume that the probabilities governing the realization of H, X and Y are as detailed in Figure 2a.
In this tree, each (internal) node is interpreted as a causal mechanism; hence a path from the root
node to one of the leaves corresponds to a particular sequential realization of causal mechanisms³.
The logic underlying the structure of this tree is self-explanatory:

1. Causal precedence: A node causally precedes its descendants. For instance, the root node
corresponding to the sure event Ω causally precedes all other nodes.

2. Resolution of variables: Each node resolves the value of a random variable. For instance,
given the node corresponding to H = h and X = ¬x, either Y = y will happen with
probability P(y|h, ¬x) = 1/4 or Y = ¬y with probability P(¬y|h, ¬x) = 3/4.

3. Heterogeneous order: The resolution order of random variables can vary across different
branches. For instance, X precedes Y under H = h, but Y precedes X under H = ¬h.
This allows modeling different causal hypotheses.
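The three rules above can be made concrete with a small data structure. The following is a minimal sketch, not the authors' implementation, under a hypothetical encoding: each internal node is a pair `(variable, branches)`, where `branches` maps an outcome such as `"x"` or `"~x"` to its probability and sub-tree, and leaves are `None`. The branch probabilities P(h) = P(x|h) = P(y|¬h) = 1/2, P(y|h,x) = P(x|¬h,y) = 3/4 and P(y|h,¬x) = 1/4 are the values used in this section's calculations; P(x|¬h,¬y) = 1/4 is our assumption, chosen to mirror the h branch. The sketch multiplies probabilities along each root-to-leaf path and checks that both hypotheses induce the same joint distribution over X and Y.

```python
from fractions import Fraction as F

def node(var, p_true, sub_true=None, sub_false=None):
    """Binary node resolving `var`; outcomes are named e.g. "x" and "~x"."""
    v = var.lower()
    return (var, {v: (p_true, sub_true), "~" + v: (1 - p_true, sub_false)})

# Hypothetical encoding of Figure 2a: under h the tree resolves X before Y,
# under ~h it resolves Y before X. P(x | ~h, ~y) = 1/4 is an assumption.
TREE = node("H", F(1, 2),
            node("X", F(1, 2), node("Y", F(3, 4)), node("Y", F(1, 4))),
            node("Y", F(1, 2), node("X", F(3, 4)), node("X", F(1, 4))))

def leaves(tree, path=(), prob=F(1)):
    """Yield every root-to-leaf path with its probability (product along the path)."""
    if tree is None:
        yield path, prob
        return
    _, branches = tree
    for outcome, (p, subtree) in branches.items():
        yield from leaves(subtree, path + (outcome,), prob * p)

def joint_given(tree, hyp):
    """P(X, Y | H = hyp); frozenset keys make the resolution order irrelevant."""
    paths = [(path, p) for path, p in leaves(tree) if path[0] == hyp]
    p_hyp = sum(p for _, p in paths)
    return {frozenset(path[1:]): p / p_hyp for path, p in paths}

for path, p in leaves(TREE):
    print(path, p)
print(joint_given(TREE, "h") == joint_given(TREE, "~h"))  # True
```

Keying the joint table by sets of outcomes, rather than by ordered paths, is what lets the comparison ignore the fact that the two hypotheses resolve X and Y in opposite orders.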

While the probability tree represents our subjective model explaining the order in which the random
values are resolved, it does not necessarily correspond to the temporal order in which the events
are revealed to us. So, for instance, under hypothesis H = h, the value of the variable Y might be
revealed before X, even though X causally precedes Y; and the hypothesis H, which precedes both
X and Y, is never observed.



² Conditional independencies are also captured within a probability tree [9, Chapter 8.2].
³ Note that the set of paths is the sample space of the experiment's probability space.
Figure 2: a) The probability tree representing the statistics of the device with two lights. The
probability of a realization (written under the leaves) is calculated by multiplying the probabilities
along the path from the root to the leaf. Note that the two hypotheses are statistically indistinguishable.
b) The probability tree resulting from (a) after setting X = x.



2.2 Interventions 



Suppose we observe that both lights are on. Have we learned anything about their causal
dependency? A brief calculation shows that this is not the case, because the posterior probabilities are
equal to the prior probabilities:

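$$P(h|x,y) = \frac{P(y|h,x)P(x|h)P(h)}{P(y|h,x)P(x|h)P(h) + P(x|\neg h,y)P(y|\neg h)P(\neg h)} = \frac{\frac{3}{4}\cdot\frac{1}{2}\cdot\frac{1}{2}}{\frac{3}{4}\cdot\frac{1}{2}\cdot\frac{1}{2} + \frac{3}{4}\cdot\frac{1}{2}\cdot\frac{1}{2}} = \frac{1}{2} = P(h).$$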

This makes sense intuitively, because by just observing that the two lights are on, it is statistically
impossible to tell which one caused the other. Note how the factorization of the likelihood P(x, y|H)
depends on whether H = h or H = ¬h. How do we extract causal information then? To answer
this question, we make use of a crucial insight of statistical causality:

To obtain new causal information from statistical data, old causal information
needs to be supplied (paraphrased as "no causes in, no causes out" or "to find
out what happens when you kick the system, you have to kick the system" [1]).

Thus, we now repeat our experiment, but this time we turn on the green light (X = x). We reflect
this choice by changing all the mechanisms that resolve the random variable X, placing all the
probability mass on the outcome X = x (see Figure 2b). Assume that we subsequently observe that
the second light is on. Then, the posterior probabilities are

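$$P(h|\hat{x},y) = \frac{P(y|h,\hat{x})P(\hat{x}|h)P(h)}{P(y|h,\hat{x})P(\hat{x}|h)P(h) + P(\hat{x}|\neg h,y)P(y|\neg h)P(\neg h)} = \frac{\frac{3}{4}\cdot 1\cdot\frac{1}{2}}{\frac{3}{4}\cdot 1\cdot\frac{1}{2} + 1\cdot\frac{1}{2}\cdot\frac{1}{2}} = \frac{3}{5},$$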

where x̂ is Pearl's notation to indicate a causal intervention on X. Since P(h|x̂, y) = 3/5 > 1/2 = P(h),
we have gathered evidence favoring the hypothesis "green causes red". This was only possible because our
intervention introduced a statistical asymmetry between the two hypotheses that did not exist before.
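The contrast between observing and intervening can be reproduced with a minimal sketch. It reuses a hypothetical tree encoding (internal nodes are `(variable, branches)` pairs, leaves are `None`) with the branch probabilities used in the two calculations above, plus the assumption P(x|¬h,¬y) = 1/4. The helper `intervene` is our own illustrative function, not an API from the paper: following the description in the text, it replaces every mechanism that resolves X with one that puts all probability mass on x, leaving the rest of the tree untouched.

```python
from fractions import Fraction as F

def node(var, p_true, sub_true=None, sub_false=None):
    v = var.lower()
    return (var, {v: (p_true, sub_true), "~" + v: (1 - p_true, sub_false)})

# Hypothetical encoding of Figure 2a (X before Y under h, Y before X under ~h).
TREE = node("H", F(1, 2),
            node("X", F(1, 2), node("Y", F(3, 4)), node("Y", F(1, 4))),
            node("Y", F(1, 2), node("X", F(3, 4)), node("X", F(1, 4))))

def leaves(tree, path=(), prob=F(1)):
    """Yield every root-to-leaf path with its probability."""
    if tree is None:
        yield path, prob
        return
    _, branches = tree
    for outcome, (p, subtree) in branches.items():
        yield from leaves(subtree, path + (outcome,), prob * p)

def intervene(tree, var, value):
    """Copy of `tree` in which every node resolving `var` puts all mass on `value`."""
    if tree is None:
        return None
    node_var, branches = tree
    new = {}
    for outcome, (p, subtree) in branches.items():
        if node_var == var:
            p = F(int(outcome == value))
        new[outcome] = (p, intervene(subtree, var, value))
    return (node_var, new)

def cond(tree, target, evidence):
    """P(target | evidence); both are sets of outcome names such as {"h"}."""
    def prob(event):
        return sum(p for path, p in leaves(tree) if event <= set(path))
    return prob(target | evidence) / prob(evidence)

print(cond(TREE, {"h"}, {"x", "y"}))                       # 1/2: observing is uninformative
print(cond(intervene(TREE, "X", "x"), {"h"}, {"x", "y"}))  # 3/5: intervening is informative
```

The intervention only changes the nodes resolving X; under ¬h this removes the factor P(x|¬h,y) = 3/4 from the likelihood, which is exactly the asymmetry the text describes.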



2.3 Extrapolation 

Let us now connect a second device to the first one (Figure 1, right device). This device carries two
spinners, a green one (U) and a red one (V). A hidden randomizing mechanism chooses their
orientations (either horizontal or vertical) independently of the state of the colored lights. However,
the connection and the mysterious color coding suggest that there must be a relation between the
two randomizing mechanisms. Hence, we impose that the combined system either follows the law
"green causes red" or "red causes green", intentionally excluding the cases "X → Y and V → U"
and "Y → X and U → V".

The probability tree over the random variables X, Y, U and V extends the probability tree from
Figure 2a by appending sub-trees over U and V, with the restriction that the nodes resolving U
precede the nodes resolving V under hypothesis H = h, and that the nodes resolving V precede the
nodes resolving U in the case H = ¬h.
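Because the spinners' orientations are chosen independently of the state of the lights, the appended sub-trees do not depend on the values of X and Y, and the likelihood of a full realization factorizes as follows under the two hypotheses (a sketch written in the notation of the equations above):

$$P(x,y,u,v|h) = P(x|h)\,P(y|h,x)\,P(u|h)\,P(v|h,u),$$
$$P(x,y,u,v|\neg h) = P(y|\neg h)\,P(x|\neg h,y)\,P(v|\neg h)\,P(u|\neg h,v).$$

Summing over u and v, the last two factors in each line sum to one. Hence any evidence that involves only X and Y, whether observed or obtained by intervention, yields the same posterior over H as in the original tree.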
Note, however, that for this new tree the posterior probability of the hypothesis "green causes
red", given that we turned on the green light and saw the red bulb light up, is identical to that of the
previous tree, namely P(h|x̂, y) = 3/5. The restriction we have imposed on the possible causal
hypotheses has enabled us to extrapolate causal information from our experience with X and Y to
the yet unobserved variables U and V. This extrapolation would not have been possible if we had
kept all four causal hypotheses. Hence, in the general case, causal extrapolation rests on constraints
on our causal hypotheses.
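To illustrate the last claim, the following minimal sketch (not the authors' code) compares the constrained tree, where a single H fixes the direction for both devices, with an unconstrained tree in which a separate, hypothetical hypothesis variable G decides the spinners' direction independently of H, giving all four hypotheses. The spinner probabilities are an assumption: we simply reuse the numbers of the lights (P(cause) = 1/2, P(effect|cause) = 3/4, P(effect|¬cause) = 1/4). After turning on the green light and seeing the red light, the sketch reports the posterior of "U → V" and, assuming hypothetically that the green spinner could also be fixed (U = u), the predicted probability that the red spinner shows v.

```python
from fractions import Fraction as F

def node(var, p_true, sub_true=None, sub_false=None):
    v = var.lower()
    return (var, {v: (p_true, sub_true), "~" + v: (1 - p_true, sub_false)})

def pair(cause, effect, then=None):
    """Resolve `cause`, then `effect`; graft `then` below every leaf.
    Hypothetical numbers: P(cause)=1/2, P(effect|cause)=3/4, P(effect|~cause)=1/4."""
    return node(cause, F(1, 2), node(effect, F(3, 4), then, then),
                node(effect, F(1, 4), then, then))

# Constrained: h means X -> Y and U -> V; ~h means Y -> X and V -> U.
TWO = node("H", F(1, 2), pair("X", "Y", pair("U", "V")), pair("Y", "X", pair("V", "U")))
# Unconstrained: an independent G (g: U -> V, ~g: V -> U) yields four hypotheses.
SPINNERS = node("G", F(1, 2), pair("U", "V"), pair("V", "U"))
FOUR = node("H", F(1, 2), pair("X", "Y", SPINNERS), pair("Y", "X", SPINNERS))

def leaves(tree, path=(), prob=F(1)):
    if tree is None:
        yield path, prob
        return
    _, branches = tree
    for outcome, (p, subtree) in branches.items():
        yield from leaves(subtree, path + (outcome,), prob * p)

def intervene(tree, var, value):
    """Copy of `tree` in which every node resolving `var` puts all mass on `value`."""
    if tree is None:
        return None
    node_var, branches = tree
    new = {}
    for outcome, (p, subtree) in branches.items():
        if node_var == var:
            p = F(int(outcome == value))
        new[outcome] = (p, intervene(subtree, var, value))
    return (node_var, new)

def cond(tree, target, evidence):
    def prob(event):
        return sum(p for path, p in leaves(tree) if event <= set(path))
    return prob(target | evidence) / prob(evidence)

# `spinner_hyp` names the outcome that means "U -> V" in each tree.
for name, tree, spinner_hyp in [("two", TWO, "h"), ("four", FOUR, "g")]:
    lit = intervene(tree, "X", "x")                   # turn on the green light
    p_dir = cond(lit, {spinner_hyp}, {"x", "y"})      # posterior of "U -> V"
    p_v = cond(intervene(lit, "U", "u"), {"v"}, {"x", "y", "u"})
    print(name, p_dir, p_v)
```

Under these assumptions the constrained tree moves the posterior of "U → V" to 3/5 and the prediction for v to 13/20, whereas the four-hypothesis tree leaves them at 1/2 and 5/8, the values both trees give before the light experiment.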

3 Concluding Remarks 

The problem of causal induction has been addressed relatively recently by the statistics and machine
learning community, mainly in the context of graphical models (see, e.g., [3, 13]). This has led
to the development of many algorithms that propose a suitable causal graphical model explaining
the data. Many of these algorithms rely on independence assumptions, and hence they naturally
proceed by exploiting the independence relations found in the data to construct a causal model.

This work outlines a general method for causal induction that is Bayesian in nature and does not rely
on independence assumptions. It is based on the idea of combining probability trees [9] with
interventions [7] for predicting the behavior of a manipulated system with multiple causal hypotheses.
We have seen that both the interventions and the (constraints on the) causal hypotheses introduce
statistical asymmetries that permit the extraction and extrapolation of causal information. Of course,
this means that the amount and the kinds of causal relations that we can discover are determined
(a) by our constraints on the set of causal hypotheses and (b) by the interventions that we are
allowed to apply to the system (and, essentially, to our hypotheses). In a sense, one could say that we
are "imprinting our own causal laws onto our experience". However, this raises more fundamental
questions that we have not addressed here: where do these constraints on our causal hypotheses
come from, and what logic do they obey?